\documentclass{standalone}
\usepackage{placeins}
\begin{document}

	\section{Calling H2O Algorithms}

	This section describes how to call H2O algorithms and feature transformers from Sparkling Water. In Scala and Python, Sparkling Water
	exposes H2O algorithms via its own API. In R, we still need to use H2O's R API.

	The following feature transformers are exposed in Sparkling Water:

	\begin{itemize}
		\item H2OWord2Vec
		\item H2OTargetEncoder
	\end{itemize}

	Currently, the following algorithms are exposed:

	\begin{itemize}
		\item DeepLearning
		\item DRF
		\item GBM
		\item XGBoost
		\item AutoML
		\item GridSearch
		\item KMeans
		\item GLM
		\item GAM
		\item CoxPH
		\item Isolation Forest
	\end{itemize}

	For the exposed algorithms above, the type of the label column determines whether a classification or a
	regression problem is solved. If the label column is categorical, H2O performs classification; otherwise, it
	performs regression. It is therefore up to the user to provide the label column in the appropriate type.
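
	The rule above can be illustrated with a small, purely hypothetical Python helper. It is not part of the
	Sparkling Water API and does not reproduce H2O's internal logic. It only assumes that a Spark \texttt{string}
	label ends up as a categorical column in H2O, while numeric labels stay numeric. The name
	\texttt{expected\_problem\_type} and the set of type names are assumptions made for this sketch.

	\begin{lstlisting}[style=Python]
# Hypothetical helper, for illustration only (not part of Sparkling Water).
# Assumption: a Spark "string" label becomes a categorical column in H2O.
CATEGORICAL_LABEL_TYPES = {"string"}

def expected_problem_type(label_spark_type):
    """Return the problem type implied by the Spark type name of the label."""
    if label_spark_type.lower() in CATEGORICAL_LABEL_TYPES:
        return "classification"
    return "regression"

print(expected_problem_type("string"))  # classification
print(expected_problem_type("double"))  # regression
	\end{lstlisting}

	In PySpark, the type name of each column can be read from \texttt{df.dtypes}, so casting the label with
	\texttt{cast("string")} or \texttt{cast("double")} is one way to switch between the two behaviours.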

	If you do not want to depend on the type of the label column, Sparkling Water also exposes dedicated regressors and
	classifiers that make the problem type explicit. For example, if you use \texttt{H2OAutoMLRegressor}, the algorithm
	always performs regression, regardless of the type of the label column. These wrappers automatically convert the label
	column to the type required for the given problem. The following wrappers exist:

	\begin{itemize}
		\item \texttt{H2OAutoMLClassifier} and \texttt{H2OAutoMLRegressor}
		\item \texttt{H2ODeepLearningClassifier} and \texttt{H2ODeepLearningRegressor}
		\item \texttt{H2ODRFClassifier} and \texttt{H2ODRFRegressor}
		\item \texttt{H2OGBMClassifier} and \texttt{H2OGBMRegressor}
		\item \texttt{H2OGLMClassifier} and \texttt{H2OGLMRegressor}
		\item \texttt{H2OXGBoostClassifier} and \texttt{H2OXGBoostRegressor}
		\item \texttt{H2OGAMClassifier} and \texttt{H2OGAMRegressor}
	\end{itemize}

	First, create an \texttt{H2OContext}:

	\begin{itemize}
		\item \textbf{Scala} \begin{lstlisting}[style=Scala]
import ai.h2o.sparkling._
import java.net.URI
val hc = H2OContext.getOrCreate()
		\end{lstlisting}
		\item \textbf{Python} \begin{lstlisting}[style=Python]
from pysparkling import *
hc = H2OContext.getOrCreate()
		\end{lstlisting}
	\end{itemize}

	Parse the data using H2O and convert it to a Spark DataFrame:

	\begin{itemize}
		\item \textbf{Scala} \begin{lstlisting}[style=Scala]
val frame = new H2OFrame(new URI("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate/prostate.csv"))
val sparkDF = hc.asSparkFrame(frame).withColumn("CAPSULE", $"CAPSULE" cast "string")
val Array(trainingDF, testingDF) = sparkDF.randomSplit(Array(0.8, 0.2))
		\end{lstlisting}
		\item \textbf{Python} \begin{lstlisting}[style=Python]
import h2o
frame = h2o.import_file("https://raw.githubusercontent.com/h2oai/sparkling-water/master/examples/smalldata/prostate/prostate.csv")
sparkDF = hc.asSparkFrame(frame)
sparkDF = sparkDF.withColumn("CAPSULE", sparkDF.CAPSULE.cast("string"))
[trainingDF, testingDF] = sparkDF.randomSplit([0.8, 0.2])
		\end{lstlisting}
	\end{itemize}

	Train the model. You can configure all available parameters using the provided setters, such as the one for the label column:

	\begin{itemize}
		\item \textbf{Scala} \begin{lstlisting}[style=Scala]
import ai.h2o.sparkling.ml.algos.H2OGBM
val estimator = new H2OGBM()
	.setLabelCol("CAPSULE")
val model = estimator.fit(trainingDF)
		\end{lstlisting}
		\item \textbf{Python} \begin{lstlisting}[style=Python]
from pysparkling.ml import H2OGBM
estimator = H2OGBM(labelCol = "CAPSULE")
model = estimator.fit(trainingDF)
		\end{lstlisting}
	\end{itemize}

	Instead of calling the generic wrapper, we can do regression explicitly as:

	\begin{itemize}
		\item \textbf{Scala} \begin{lstlisting}[style=Scala]
import ai.h2o.sparkling.ml.algos.H2OGBMRegressor
val estimator = new H2OGBMRegressor()
	.setLabelCol("CAPSULE")
val model = estimator.fit(trainingDF)
		\end{lstlisting}
		\item \textbf{Python} \begin{lstlisting}[style=Python]
from pysparkling.ml import H2OGBMRegressor
estimator = H2OGBMRegressor(labelCol = "CAPSULE")
model = estimator.fit(trainingDF)
		\end{lstlisting}
	\end{itemize}

	or classification explicitly as:

	\begin{itemize}
		\item \textbf{Scala} \begin{lstlisting}[style=Scala]
import ai.h2o.sparkling.ml.algos.H2OGBMClassifier
val estimator = new H2OGBMClassifier()
	.setLabelCol("CAPSULE")
val model = estimator.fit(trainingDF)
		\end{lstlisting}
		\item \textbf{Python} \begin{lstlisting}[style=Python]
from pysparkling.ml import H2OGBMClassifier
estimator = H2OGBMClassifier(labelCol = "CAPSULE")
model = estimator.fit(trainingDF)
		\end{lstlisting}
	\end{itemize}

	Run predictions:

	\begin{itemize}
		\item \textbf{Scala} \begin{lstlisting}[style=Scala]
model.transform(testingDF).show(false)
		\end{lstlisting}
		\item \textbf{Python} \begin{lstlisting}[style=Python]
model.transform(testingDF).show(truncate = False)
		\end{lstlisting}
	\end{itemize}
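
	To compare predictions with the true labels, the transformed frame can be narrowed down to two columns. The
	following is a minimal sketch: the helper name \texttt{label\_vs\_prediction} is hypothetical, and it assumes
	that the model writes its output to a column named \texttt{prediction}, which should be checked against the
	schema printed by \texttt{show} above.

	\begin{lstlisting}[style=Python]
def label_vs_prediction(model, df, label_col, prediction_col="prediction"):
    """Score df with the model and keep only the label and prediction columns."""
    # transform() appends the prediction columns to the input data frame
    scored = model.transform(df)
    # prediction_col is an assumption; adjust it if the schema differs
    return scored.select(label_col, prediction_col)

# Usage (requires the model and testingDF from the steps above):
# label_vs_prediction(model, testingDF, "CAPSULE").show(truncate=False)
	\end{lstlisting}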

	The code for the rest of the exposed algorithms is analogous.

	In the case of AutoML, after you have fit the model, you can obtain the leaderboard frame using the
	\texttt{estimator.getLeaderboard()} method.
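
	As a hedged sketch of this workflow, the helper below fits an AutoML estimator and returns the leaderboard
	together with the trained model. The function name \texttt{fit\_with\_leaderboard} is hypothetical. The estimator
	is expected to be an \texttt{H2OAutoML} instance from \texttt{pysparkling.ml}, and the accessor is assumed to be
	spelled \texttt{getLeaderboard} in the installed release.

	\begin{lstlisting}[style=Python]
def fit_with_leaderboard(automl, training_df):
    """Fit an AutoML estimator and return (model, leaderboard)."""
    # fit() trains the candidate models and returns the resulting model
    model = automl.fit(training_df)
    # the leaderboard is read from the estimator, not from the model
    leaderboard = automl.getLeaderboard()
    return model, leaderboard

# Usage (requires a running H2OContext and trainingDF from above):
# from pysparkling.ml import H2OAutoML
# automl = H2OAutoML(labelCol="CAPSULE")
# model, leaderboard = fit_with_leaderboard(automl, trainingDF)
# leaderboard.show(truncate=False)
	\end{lstlisting}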

	In the case of GridSearch, after you have fit the model, you can obtain the grid models, their parameters, and their metrics using
	\texttt{estimator.getGridModels()}, \texttt{estimator.getGridModelsParams()}, and \\
	\texttt{estimator.getGridModelsMetrics()}.
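
	The three accessors can be collected in one place, as in the following minimal sketch. The helper name
	\texttt{summarize\_grid} is hypothetical, and the hyper-parameter names and values in the usage comment are
	examples only. Depending on the Sparkling Water release, the keys of the hyper-parameter map may have to be
	spelled differently, so treat them as assumptions.

	\begin{lstlisting}[style=Python]
def summarize_grid(grid, training_df):
    """Fit a grid search estimator and collect what it exposes afterwards."""
    # fit() explores the hyper-parameter space and returns the selected model
    best_model = grid.fit(training_df)
    return {
        "best_model": best_model,
        "models": grid.getGridModels(),
        "params": grid.getGridModelsParams(),
        "metrics": grid.getGridModelsMetrics(),
    }

# Usage (requires a running H2OContext and trainingDF from above):
# from pysparkling.ml import H2OGBM, H2OGridSearch
# grid = H2OGridSearch(algo=H2OGBM(labelCol="CAPSULE"),
#                      hyperParameters={"ntrees": [10, 30], "seed": [1, 2]})
# summary = summarize_grid(grid, trainingDF)
# summary["metrics"].show(truncate=False)
	\end{lstlisting}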
\end{document}
